ome.tif image via
RBioFormats package,Biotechnology.zip, which has already been provided on
Canvas.

The data described in this document stem from a new biotechnology: molecule-resolved spatial genomics. In particular, we will explore data generated by the 10x Genomics Xenium instrument on a fresh frozen mouse brain coronal section (Tiny subset). The technology results in several outputs, including:
The full data and description can be found at this link.
For this reproducible code we assume that this RMarkdown document is saved within the following directory structure:
Biotechnology/
  data_processed/
    clusters.csv
    cell_boundaries.csv.gz
    morphology_focus.tif
  data_raw/
    Xenium_V1_FF_Mouse_Brain_Coronal_Subset_CTX_HP_outs.zip
    <unzipped files>
  scripts/
    DATA3888_Biotechnology_generateImages_2024.Rmd (this document)

For you to be able to fully re-run this code you will need to
download the contents of data_raw/ separately (see next
section).
You are provided this directory structure, with the contents of
data_raw/ removed due to the large file size.
The contents of the
Xenium_V1_FF_Mouse_Brain_Coronal_Subset_CTX_HP_outs folder
come from the
Xenium_V1_FF_Mouse_Brain_Coronal_Subset_CTX_HP_outs.zip
file (approx 3.5GB), which is available to download via this LINK,
or can be downloaded programmatically into the target directory
using wget.
wget -P ../data_raw/ https://cf.10xgenomics.com/samples/xenium/1.0.2/Xenium_V1_FF_Mouse_Brain_Coronal_Subset_CTX_HP/Xenium_V1_FF_Mouse_Brain_Coronal_Subset_CTX_HP_outs.zip
unzip ../data_raw/Xenium_V1_FF_Mouse_Brain_Coronal_Subset_CTX_HP_outs.zip -d ../data_raw/
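The same download can also be driven from within R. The following is only a minimal sketch and not part of the original workflow: the helper name fetch_raw_bundle is hypothetical, and it relies on base R's download.file() and unzip(). R's default download timeout is 60 seconds, so the sketch raises it for a file of this size.

```r
# Hypothetical helper (an assumption, not part of the original workflow):
# download the raw data bundle into dest_dir unless the zip file is already
# there, then optionally unzip it into the same directory.
fetch_raw_bundle <- function(url, dest_dir = "../data_raw", unzip_after = TRUE) {
  dir.create(dest_dir, recursive = TRUE, showWarnings = FALSE)
  zip_path <- file.path(dest_dir, basename(url))

  if (!file.exists(zip_path)) {
    # The default timeout of 60 seconds is too short for a multi-gigabyte file.
    old_opts <- options(timeout = max(3600, getOption("timeout")))
    on.exit(options(old_opts), add = TRUE)
    download.file(url, destfile = zip_path, mode = "wb")
  }

  if (unzip_after) {
    unzip(zip_path, exdir = dest_dir)
  }

  invisible(zip_path)
}

# Example call (commented out because it downloads approx 3.5GB):
# fetch_raw_bundle("https://cf.10xgenomics.com/samples/xenium/1.0.2/Xenium_V1_FF_Mouse_Brain_Coronal_Subset_CTX_HP/Xenium_V1_FF_Mouse_Brain_Coronal_Subset_CTX_HP_outs.zip")
```

Skipping the download when the zip file already exists mirrors the way the .tif conversion later in this document is only run when its target file is missing.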
Note! It is very important to ensure you are working from the
correct working directory, i.e. within the scripts
folder in the directory structure described above.
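One way to guard against running from the wrong place is a small check at the top of the script. This is a sketch under the assumption that the layout is exactly the one described above; the function name check_working_directory is hypothetical.

```r
# Hypothetical check (not in the original document): confirm that the working
# directory is the scripts/ folder and that data_raw/ and data_processed/
# sit next to it, as in the directory structure described above.
check_working_directory <- function(wd = getwd()) {
  in_scripts <- basename(wd) == "scripts"
  siblings <- file.path(dirname(wd), c("data_raw", "data_processed"))
  has_siblings <- all(dir.exists(siblings))
  if (!in_scripts || !has_siblings) {
    warning("Working directory does not look like Biotechnology/scripts: ", wd)
  }
  in_scripts && has_siblings
}

# Usage: returns TRUE when run from Biotechnology/scripts, otherwise warns
# and returns FALSE.
# check_working_directory()
```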
EBImage package

EBImage is an R package that is available in the Bioconductor Project.
Bioconductor is similar to the Comprehensive R Archive Network (CRAN),
in that you can install packages from this repository.
EBImage
provides general purpose functionality for image processing and
analysis. In the context of (high-throughput) microscopy-based cellular
assays, EBImage offers tools to segment cells and extract quantitative
cellular descriptors. This allows the automation of such tasks using the
R programming language and facilitates the use of other tools in the R
environment for signal processing, statistical modeling, machine
learning and visualization with image data.
This chapter
in Modern Statistics for Modern Biology is a great reference for
using EBImage for different types of imaging data.
To install the EBImage package, you can run the chunk
below. This will check whether you have the BiocManager
package installed, and if not it will install BiocManager.
Then, the EBImage package will be installed via the
BiocManager::install() function.
Note: if you attempt to run install.packages("EBImage")
you may be met with an error! This is because the package is available
in Bioconductor and not on CRAN.
if (!require("BiocManager", quietly = TRUE))
  install.packages("BiocManager")
BiocManager::install("EBImage")
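Before installing anything, it can be useful to see which of the packages used in this document are already present. The sketch below uses only base R; looking names up in installed.packages() is chosen here because it does not load any of the packages.

```r
# Report which packages used in this document are already installed.
# installed.packages() only inspects the library folders; nothing is loaded.
pkgs <- c("BiocManager", "EBImage", "RBioFormats")
is_installed <- setNames(pkgs %in% rownames(installed.packages()), pkgs)
print(is_installed)
```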
.ome.tif to .tif format

The raw data bundle contains the cell morphology image in the ome.tif
file format. This type of file includes the image pixel intensities as
well as additional metadata that is associated with the microscopy
experiment. Since we are interested in the image information only, we
need to convert to a .tif format to enable further
downstream processing with EBImage.
You are given the .tif file in the Processed Data
Bundle, but you can see how this was generated in the rest of this
section.
Load the EBImage R package.
library(EBImage)
Read in the morphology focus .ome.tif image and export it
as a .tif into the ../data_processed/ folder. Only do so
if the target .tif file does not already exist.
Note that if we need to generate the .tif file, we first need
to set the Java memory limit to 10GB (this option has to be set before
the package is loaded) and then load the RBioFormats package, which is
available on the Bioconductor development branch and on GitHub.
tifFile = "../data_processed/morphology_focus.tif"
if (!file.exists(tifFile)) {
  options(java.parameters = "-Xmx10g")
  library(RBioFormats)
  checkJavaMemory()
  img_ome = RBioFormats::read.image("../data_raw/morphology_focus.ome.tif",
                                    read.metadata = FALSE,
                                    normalize = TRUE)
  img = img_ome[[1]]@.Data
  EBImage::writeImage(x = img, files = tifFile, type = "tiff")
}
../data_processed/ directory

Since the raw data bundle contains many large files, for convenience
we have copied two files from the ../data_raw/ directory to
the ../data_processed/ directory. This can be done
programmatically using the system() function.
system("cp ../data_raw/cell_boundaries.csv.gz ../data_processed/cell_boundaries.csv.gz")
system("cp ../data_raw/analysis/clustering/gene_expression_graphclust/clusters.csv ../data_processed/clusters.csv")
Read and display the morphology image. For display, the intensities are rescaled by dividing them by the 99th percentile of their distribution.
img = EBImage::readImage(tifFile)
EBImage::display(img/quantile(img, 0.99))
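To see what this rescaling does to the intensities, here is a small base R sketch on a synthetic matrix (the values are made up and only stand in for an image). The explicit clipping with pmin() is an addition of this sketch; the chunk above only divides by the quantile.

```r
set.seed(1)
# Synthetic "image": mostly dim pixels plus a handful of very bright outliers.
toy <- matrix(rexp(100 * 100, rate = 20), nrow = 100)
toy[sample(length(toy), 20)] <- 1

# Divide by the chosen quantile, then clip everything brighter than it at 1.
scale_to_quantile <- function(x, prob = 0.99) {
  scaled <- x / quantile(x, prob, names = FALSE)
  pmin(scaled, 1)
}

toy_scaled <- scale_to_quantile(toy)
range(toy)              # raw intensities, dominated by the outliers
range(toy_scaled)       # rescaled intensities, capped at 1
mean(toy_scaled == 1)   # fraction of pixels at the cap (roughly 1%)
```

Dividing by a high percentile rather than the maximum keeps a few very bright pixels from compressing the rest of the image into near-black.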
Cell segmentation is provided in the data bundle as a
.csv file containing the vertices around each cell's
boundary. The boundary coordinates need to be converted from
micrometres (um) to pixels by dividing by the pixel size
(0.2125 um per pixel in the code below). This scaling factor can be found in the
../data_raw/experiment.xenium file under “pixel_size”.
Note that read.csv can read a Gzip compressed file.
cell_boundaries = read.csv("../data_processed/cell_boundaries.csv.gz", header = TRUE)
cell_boundaries$vertex_x_trans = cell_boundaries$vertex_x/0.2125
cell_boundaries$vertex_y_trans = cell_boundaries$vertex_y/0.2125
head(cell_boundaries)
##   cell_id vertex_x vertex_y vertex_x_trans vertex_y_trans
## 1       1 1901.875 2526.413           8950          11889
## 2       1 1901.450 2537.038           8948          11939
## 3       1 1900.175 2539.375           8942          11950
## 4       1 1896.562 2539.800           8925          11952
## 5       1 1885.938 2537.887           8875          11943
## 6       1 1882.963 2542.775           8861          11966
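The conversion can be wrapped in a small function that also rounds to whole pixels. This is a sketch only: the function name um_to_pixel is hypothetical, and the rounding step is an addition (when a non-integer value is used as an index, R truncates it, so rounding simply makes the choice explicit).

```r
pixel_size_um <- 0.2125  # micrometres per pixel, the value used in the chunk above

# Hypothetical helper: convert micrometre coordinates to whole-pixel indices.
um_to_pixel <- function(um, pixel_size = pixel_size_um) {
  as.integer(round(um / pixel_size))
}

# First vertex of cell 1 from the table above
um_to_pixel(c(1901.875, 2526.413))
# → 8950 11889
```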
The data bundle also contains gene expression information, which has
been used to perform graph-based clustering. We read the resulting
cluster assignments in via read.csv.
clusters = read.csv("../data_processed/clusters.csv")
head(clusters)
##   Barcode Cluster
## 1       1       6
## 2       2       6
## 3       3       2
## 4       4       4
## 5       5       6
## 6       6       4
ncells = nrow(clusters)
ncells
## [1] 36553
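The extraction loop later in this document looks up a cell's cluster by row number and its boundary by cell_id, which works as long as Barcode equals the row index. A sketch of how that assumption could be checked, using toy tables with made-up values (clusters_toy and boundaries_toy are hypothetical stand-ins for clusters and cell_boundaries):

```r
# Toy stand-ins for the clusters and cell_boundaries tables (values made up).
clusters_toy <- data.frame(Barcode = 1:4, Cluster = c(6, 6, 2, 4))
boundaries_toy <- data.frame(cell_id = c(1, 1, 2, 3, 3, 4))

# Does row k of the cluster table describe the cell with Barcode k?
rows_match <- identical(clusters_toy$Barcode, seq_len(nrow(clusters_toy)))
# Does every cell with a boundary also have a cluster assignment?
ids_known <- all(boundaries_toy$cell_id %in% clusters_toy$Barcode)
stopifnot(rows_match, ids_known)

# Looking up by identifier does not depend on the row order at all.
cluster_of <- function(id, tab) tab$Cluster[match(id, tab$Barcode)]
cluster_of(3, clusters_toy)
# → 2
```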
In this code chunk, we extract morphology images for 1,000 random cells. For each cell, we subset the morphology image to the rectangle of pixels that cover the cell segmentation boundary.
To get a sense of the variety of cell morphologies, we visualise the first five randomly selected cells.
set.seed(2024)
ncells_subset = 1000
cells_subset = sample(ncells, ncells_subset)
table(clusters[cells_subset, "Cluster"], useNA = "always")
##
##    1    2    3    4    5    6    7    8    9   10   11   12   13   14   15   16
##   75   89   91   64   62   65   53   40   26   37   45   33   34   41   23   35
##   17   18   19   20   21   22   23   24   25   26   27   28 <NA>
##   19   17   18   24   17   21   16   11   12   12   11    9    0
for (i in cells_subset) {
  # extract the boundary vertices for the selected cell
  bounds_i = subset(cell_boundaries, cell_id == i)
  # extract the cluster value for the selected cell
  clustval_i = clusters[i, "Cluster"]
  # extract the pixel intensities for the area covering the cell boundary
  img_sub = img[min(bounds_i$vertex_x_trans):max(bounds_i$vertex_x_trans),
                min(bounds_i$vertex_y_trans):max(bounds_i$vertex_y_trans)]
  # normalise the pixel intensities according to 99th percentile
  img_sub_norm = img_sub/quantile(img_sub, 0.99)
  # as an example, display the images for the first five selected cells
  if (i %in% cells_subset[1:5]) {
    print(paste0("displaying image for cell ", i))
    EBImage::display(img_sub_norm)
  }
  # if there is no folder for the cluster (or for cell_images), create it;
  # dir.create() is used instead of a system mkdir call so that this also
  # works on operating systems without a mkdir command
  clustval_i_directory = paste0("../data_processed/cell_images/cluster_", clustval_i)
  if (!dir.exists(clustval_i_directory)) {
    dir.create(clustval_i_directory, recursive = TRUE)
  }
  # save the extracted image as a png file
  EBImage::writeImage(x = img_sub_norm,
                      files = paste0(clustval_i_directory, "/cell_", i, ".png"),
                      type = "png")
}
## [1] "displaying image for cell 21029"
## [1] "displaying image for cell 19872"
## [1] "displaying image for cell 7802"
## [1] "displaying image for cell 33803"
## [1] "displaying image for cell 19362"
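The bounding-box cropping used in the loop can be illustrated on a plain matrix. The sketch below is not the code used to generate the images: the helper crop_to_bounds is hypothetical, the vertex values are made up, and the rounding and clamping to the image extent are additions meant to show how a boundary that touches the image edge could be handled. As with EBImage images, the first index is treated as x and the second as y.

```r
# Hypothetical helper: crop a matrix to the bounding box of a set of vertices
# (given in pixel units), rounding to whole pixels and clamping to the image.
crop_to_bounds <- function(img, x, y) {
  x_rng <- pmin(pmax(round(range(x)), 1), nrow(img))
  y_rng <- pmin(pmax(round(range(y)), 1), ncol(img))
  img[x_rng[1]:x_rng[2], y_rng[1]:y_rng[2], drop = FALSE]
}

# Toy 20 x 30 "image" whose entries are just their linear index.
toy_img <- matrix(seq_len(20 * 30), nrow = 20, ncol = 30)
# Made-up boundary vertices; the largest x value lies outside the image.
vx <- c(17.2, 19.6, 22.4)
vy <- c(4.8, 6.1, 9.3)

patch <- crop_to_bounds(toy_img, vx, vy)
dim(patch)
# → 4 5
```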
cell_images.zip processed data file

The contents of the data_processed/cell_images folder
can then be zipped into a file to be shared separately, with the following
commands in the terminal. The first command changes the working
directory to ../data_processed/ and the next command
creates the cell_images.zip file, containing all the
contents of the cell_images/ folder.
cd ../data_processed/
zip -r cell_images.zip cell_images/*
sessionInfo()
## R version 4.3.1 (2023-06-16)
## Platform: x86_64-apple-darwin20 (64-bit)
## Running under: macOS Ventura 13.5.2
##
## Matrix products: default
## BLAS: /Library/Frameworks/R.framework/Versions/4.3-x86_64/Resources/lib/libRblas.0.dylib
## LAPACK: /Library/Frameworks/R.framework/Versions/4.3-x86_64/Resources/lib/libRlapack.dylib; LAPACK version 3.11.0
##
## locale:
## [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
##
## time zone: Europe/Berlin
## tzcode source: internal
##
## attached base packages:
## [1] stats graphics grDevices utils datasets methods base
##
## other attached packages:
## [1] EBImage_4.43.0
##
## loaded via a namespace (and not attached):
## [1] cli_3.6.1 knitr_1.43 rlang_1.1.1
## [4] xfun_0.39 highr_0.10 tiff_0.1-11
## [7] png_0.1-8 jsonlite_1.8.5 RCurl_1.98-1.12
## [10] htmltools_0.5.5 formatR_1.14 sass_0.4.6
## [13] locfit_1.5-9.8 rmarkdown_2.22 grid_4.3.1
## [16] evaluate_0.21 jquerylib_0.1.4 abind_1.4-5
## [19] bitops_1.0-7 fastmap_1.1.1 yaml_2.3.7
## [22] compiler_4.3.1 htmlwidgets_1.6.2 fftwtools_0.9-11
## [25] rstudioapi_0.14 lattice_0.21-8 digest_0.6.31
## [28] R6_2.5.1 bslib_0.5.0 tools_4.3.1
## [31] jpeg_0.1-10 BiocGenerics_0.47.0 cachem_1.0.8
knitr::knit_exit()